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Abstract 

Genome-wide association studies (GWAS) and subsequent dense-genotyping of associated loci identified over a hundred 
single-nucleotide polymorphism (SNP) variants associated with the risk of rheumatoid arthritis (RA), type 1 diabetes (T1D), 
and celiac disease (CeD). Immunological and genetic studies suggest a role for CD4-positive effector memory T (CD + T EM ) 
cells in the pathogenesis of these diseases. To elucidate mechanisms of autoimmune disease alleles, we investigated 
molecular phenotypes in CD4 + effector memory T cells potentially affected by these variants. In a cohort of genotyped 
healthy individuals, we isolated high purity CD4 + T EM cells from peripheral blood, then assayed relative abundance, 
proliferation upon T cell receptor (TCR) stimulation, and the transcription of 215 genes within disease loci before and after 
stimulation. We identified 46 genes regulated by c/s-acting expression quantitative trait loci (eQTL), the majority of which 
we detected in stimulated cells. Eleven of the 46 genes with eQTLs were previously undetected in peripheral blood 
mononuclear cells. Of 96 risk alleles of RA, T1D, and/or CeD in densely genotyped loci, eleven overlapped c/s-eQTLs, of 
which five alleles completely explained the respective signals. A non-coding variant, rs389862 A , increased proliferative 
response (p = 4.75x10~ 8 ). In addition, baseline expression of seventeen genes in resting cells reliably predicted proliferative 
response after TCR stimulation. Strikingly, however, there was no evidence that risk alleles modulated CD4 + T EM abundance 
or proliferation. Our study underscores the power of examining molecular phenotypes in relevant cells and conditions for 
understanding pathogenic mechanisms of disease variants. 
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Introduction 

Memory T cells are an important component of the adaptive 
immune system. They circulate between lymphoid organs, blood, 
and peripheral tissues, and facilitate faster and more aggressive 
immune response to antigens after re-exposure. CD4-positive 
effector memory T (CD4 + T EM ) cells are known to migrate to 
peripheral sites of inflammation upon activation, and rapidly 
produce both Thl and Th2 cytokines [1]. Investigators have long 



suggested their involvement in autoimmune diseases including 
rheumatoid arthritis (RA), type I diabetes (T1D), and celiac disease 
(CeD) [2-5]. However, whether changes in cell population subsets 
and functions are causal or reactive to disease is uncertain. One 
strategy to answer this question is to examine potential interme- 
diate molecular phenotypes, and identify those modulated by 
genetic variants. In order to understand the pathogenic roles of 
CD4 + Tem cells in autoimmunity, we aimed to characterize the 
variation in their phenotypic and functional markers in a healthy 
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Author Summary 

Genome-wide association studies have identified hun- 
dreds of genetic variants associated to autoimmune 
diseases. To understand the mechanisms and pathways 
affected by these variants, follow-up studies of molecular 
phenotypes and functions are required. Given the diversity 
of cell types and specialization of functions within the 
immune system, it is crucial that such studies focus on 
specific and relevant cell types. Here, we studied genetic 
and cellular traits of CD4-positive effector memory T (CD4 + 
T EM ) cells, which are particularly important in the onset of 
rheumatoid arthritis, celiac disease, and type 1 diabetes. In 
a cohort of healthy individuals, we purified CD4 + T EM cells, 
assayed genome-wide single nucleotide polymorphisms 
(SNPs), abundance of CD4 + T EM cells in blood, proliferation 
upon T cell receptor stimulation, and 215 gene transcripts 
in resting and stimulated states. We found that expression 
levels of 46 genes were regulated by nearby SNPs, 
including disease-associated SNPs. Many of these expres- 
sion quantitative trait loci were not previously seen in 
studies of more heterogeneous peripheral blood cells. We 
demonstrated that relative abundance and proliferative 
response of CD4 + T EM cells varied in the population, 
however disease alleles are unlikely to confer risk by 
modulating these traits in this cell type. 

population, and to identify whether these markers intersect with 
the genetic basis for autoimmunity. 

The majority of autoimmune disease risk variants are located in 
non-coding regions of the genome. It is reasonable to hypothesize 
that a subset of them causes disease by altering gene regulatory 
mechanisms as expression quantitative trait loci (eQTL) [6-9] . So 
far, studies of gene regulation have largely been carried out in cell 
lines and primary resting blood cells including undifferentiated 
CD4 + T cells, B cells, monocytes, and dendritic cells [8,10-12]. 
However, to understand the pathogenic mechanisms of risk 
variants, especially when studying the immune system where cells 
are highly diverse and functionally specialized, it is crucial to focus 
on relevant cell types and stimulated cellular states. 

We have previously shown that genes within RA risk loci were most 
specifically expressed in CD4 + T EM cells, compared to more than 200 
other immune cell types of various lineages and developmental stages 
(p= 1.00 xlO" 8 ; Figure SI) [13]. Celiac disease and T1D loci were 
also enriched for genes specifically expressed in CD4 + Tem cells 
(p= 1.43 XlO" 5 and 1.29 xlO" 4 , respectively; Figure SI) [13]. Non- 
coding single nucleotide polymorphisms (SNPs) associated with RA 
significantly overlap chromatin marks of trimethylation of histone H3 
at lysine 4 (H3K4me3) specifically in CD4 + regulatory and memory T 
cells {p= 1.3x10"* and 7.0xl0" 4 , respectively) [14]. 

We hypothesized that the risk alleles of these conditions might 
influence CD4 + T EM quantitative molecular phenotypes: 1) the 
expression of immune-related genes; 2) the relative abundance of 
CD4 + T EM cells in peripheral blood; and 3) proliferative response 
to T cell receptor (TCR) stimulation. To this end, we undertook a 
large immunoprofiling study in a healthy population of 174 
European-descent individuals, by cross-analyzing genotype, tran- 
scription, abundance, and proliferative response in primary CD4 
Tem cells. Because the post-stimulation activation of CD4 + T EM 
cells is presumably crucial for their autoimmune response, we 
assayed cells not only at rest, but also after T cell receptor (TCR) 
stimulation with anti-CD3/CD28 beads. As such, this study is the 
first to our knowledge to map expression quantitative trait loci and 
examine immunological cellular traits in primary CD4 + T E m cells 
under multiple states. 



Using the ImmunoChip platform, investigators recently densely 
genotyped 186 loci disease that originally arose through genome- 
wide association studies (GWAS) in case-control samples for RA, 
CeD, and inflammatory bowel disease [15-17], as well as T1D 
(unpublished data). Dense genotyping allowed localization of 
association signals within these disease loci to a set of alleles that 
are very likely to be causal. Within these loci, we have a greater 
ability to identify co-localization between alleles driving variation 
in molecular phenotypes (such as eQTLs) and the disease risk 
alleles. However, in instances where multiple variants are in 
perfect linkage, we cannot pinpoint the exact causal variant 
without functional evaluation. 

Results 

The experimental protocol (Figure 1) is described in detail in 
Methods and Text SI. Briefly, we obtained peripheral blood 
mononuclear cells (PBMCs) from the whole blood of healthy 
individuals via Ficoll-Paque centrifugation, and then used mag- 
netic- and fluorescence-activated cell sorting to isolate CD4 + T EM 
cells at a high degree of purity (>90%; see Figure S2A). We 
acquired genome-wide genotype data of about 640,000 SNPs on 
Illumina Infinium Human OmniExpress Exome BeadChips [18]. 
For each individual we then measured three quantitative 
phenotypes: 1) the expression of 215 genes (see Table SI) before 
and after T cell receptor (TCR) stimulation by anti-CD3/CD28 
antibody beads; 2) the relative abundance of CD4 + T EM cells 
(CD45RA"/CD45RO + /CD62L" /low ) as a proportion of total 
CD4 + T cells; and 3) proliferation upon stimulation. Since we had 
low numbers of primary cells for expression profiling, we used the 
highly sensitive NanoString nCounter assay to avoid biases 
potentially induced by cDNA preparation. Out of the 2 1 5 genes 
assayed, 1 1 5 were within densely genotyped disease risk loci (see 
Tables S2 and S3). We quantified CD4 + T EM cell abundance 
with X-Cyt, an automated statistical method that accurately 
identifies cell populations in cytometry data [19]. 

Mapping cis-eQTLs that regulate genes in risk loci 

We first aimed to identify SNP variants that regulated 
expression of genes in cis. To best localize eQTL signals, we 
imputed 1000 Genomes variants within 250 kb from the 
transcription start site (TSS) of each gene (excluding five HLA 
genes and five long non-coding RNAs). We tested SNPs in gene- 
coding and non-coding regions in both resting and stimulated 
CD4 + T EM cells. We included gender and the top five principal 
components of the genotype data (calculated by EIGENSTRAT) 
as covariates in regression. To adjust for multiple hypothesis 
testing, we conducted 10,000 permutations within each gene 
region to calculate empirical /(-values, and then reported 
associations at a false discovery rate of 5 % . 

In total, we observed 46 genes (22.4%) with aj-eQTL signals, 
including 17 in resting cells and 43 in stimulated cells (Tables 1 
and 2, Figure 2A). For 14 of the 46 genes (30.4%), we detected 
eQTL signals in both resting (14/17, 82.4%) and stimulated (14/ 
43, 32.6%) states. In four of these 14 genes (FHL3, GRB10, 
IL18R1, and PIGCj, the lead eQTL SNPs across resting and 
stimulated states were identical. In another five genes (C1QTNF6, 
PRDM1, SKAP2, DDX6, and LYRM7), the lead SNPs are in tight 
LD (r = 0.80—1; based on 1000 Genomes Release 2, European 
samples). For the remaining five genes (BLK, TMPRSS3, CP) 101, 
ORMDL3, and GSDMB), the lead SNPs from the two states were in 
partial LD (0.42<r 2 <0.56). In these five cases, we could not be 
confident that the eQTL SNPs across stimulation states were 
tagging the same variant. 
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Figure 1. Schematic of the experimental workflow. We collected 
four types of data from each individual: 1) quality-controlled genome- 
wide SNP data containing 638,347 markers collected on lllumina 
Infinium Human OmniExpress Exome BeadChips, 2) abundance of CD4 
T EM cells as a percentage of all CD4 T cells obtained by FACS and 
quantified by X-Cyt, 3) average cell division upon T cell receptor 
stimulation by anti-CD3/CD28 commercial beads, measured using a 
CFSE (carboxyfluorescein succinimidyl ester) dye dilution assay, and 4) 
expression of 215 genes measured by NanoString nCounter. We 
repeated each proliferation assay in two-three technical replicates. Cell 
sorting purity and replication correlations for CD4 T EM abundance, 
division index, and proliferation index are shown in Figure S2. 
doi:10.1371/journal.pgen.1004404.g001 



Three genes {IL23R, PLCH2, and RGS1) had statisticaUy 
significant eQTLs exclusively in the resting state, while 29 genes 
had statistically significant eQTLs exclusively in stimulated cells, 
such as rs942793 associated with J(MI£1 expression (Figure 2B). 
One possibility is that some SNPs failed to reach significance 
threshold due to the small sample size or low expression levels in 
resting cells. However, we observed many genes with truly state- 
specific eQTLs, where the estimated effect sizes (fi) of the eQTL 
SNP differed significantly across resting and stimulated states. To 
systematically compare the f3 lest and /? stim for each gene, we used a 
2-statistic to quantify the probability that they differ. We then 
reported the Rvalue (two-tailed) assuming that z is distributed as 
standard normal, considering p<0.05 to be significantly different 



("state-specific"; see Tables 1 and 2). For example, rsl2746918 
increased the expression of PLCH2 significantly only in resting 
cells; and j? rest was approximately twice as large as /S st i m 
(Figure 2C). We note that 1 of the 3 eQTLs in resting cells 
was state-specific (p<0.05), and 13 out the 29 eQTLs seen in 
stimulated cells were state-specific (p<0.05). Of the 14 eQTLs that 
were shared between resting and stimulated cells, only 4 of them, 
BLK (Figure 2D), CD101, PIGC, and PRDM1, had different fi'.s 
across states. The abundance of eQTLs detected exclusively in 
stimulated cells underscores the importance of studying cells in 
different cellular states. 

We wanted to assess whether the eQTLs might act by altering 
gene regulatory elements in CD4 + T EM cells. To this end we asked 
whether the eQTL SNPs co-localized with marks of active 
promoters or enhancers. We utilized H3K4me3 marks from the 
NIH Roadmap Epigenomics Mapping Consortium [20] measured 
by ChlP-seq in primary CD4 + memory T cells. For the SNP with 
the strongest association to each gene, we queried the distance of the 
nearest H3K4me3 mark to this SNP or its LD partners (r 2 >0.8). We 
compared this distance measure between two sets of SNPs: the 46 
SNPs with significant eQTL associations (FDR<5%, resting or 
stimulated), and the SNPs most strongly correlated with the other 
159 genes but did not reach significance threshold. Indeed, the 46 
significant eQTL SNPs were located at smaller distances to 
H3K4me3 marks (p = l.lOxlO -7 , one-sided Mann-Whitney test, 
Figure S3A). In addition, we queried the height of each H3K4me3 
mark's peak, which reflected the number of reads at a given position 
compared to genomic controls as defined by the MACS software 
package. A tall peak gives us confidence that the mark is present in a 
large proportion of cells. Comparing the marks nearest to the two 
sets of SNPs, we saw that the 46 eQTL SNPs were also located near 
taller peaks (p = 9.56 x 10" 8 , Figure S3B). 

Many eQTLs are CD4 + T EM cell-specific 

We compared the as-eQTLs we discovered to those found in 
heterogeneous peripheral blood mononuclear cells (PBMC) in a 
large genome-wide eQTL meta-study (n = 5,331) conducted by 
Westra et al. [8]. At 5% FDR, eleven of the 46 eQTL genes we 
identified showed no detectable signal in PBMCs at 50% FDR. 
We saw significant associations in 131 genes at 50% FDR, 53 of 
which had no signal in PBMCs at 50% FDR (Tables 1 and 2). 
We hypothesized that these genes tended to be more specifically 
expressed in CD4 + T EM cells, thus making eQTLs readily 
detectable in the purified cell population. To assess this, we 
examined cell-specific expression of the genes the ImmGen 
dataset, which assayed the genome-wide expression in 247 murine 
mouse immunological cell types [13,21]. We found that the genes 
with CD4 + T EM cell-specific eQTLs (at 50% FDR) were more 
specifically expressed in CD4 + T EM cells than genes with eQTLs 
detected in both datasets (p = 0.044, one-sided Mann- Whitney 
test). 

Autoimmune disease alleles affect the transcription of 
genes in cis 

We then focused on 1 15 genes near 96 risk alleles of RA, T1D, 
and/or CeD in densely genotyped loci (182 gene-SNP pairs, 
including two risk alleles shared by at least two diseases, see 
Tables S2 and S3). We discovered that eleven (11.4%) disease- 
associated SNPs (6 of 24 RA SNPs, 5 of 37 T1D SNPs, and 3 of 37 
CeD SNPs) correlated significantly with the expression of ten 
genes in either resting or stimulated state (Table S3). In addition, 
there was substantial enrichment of nominally significant associ- 
ations (/)<0.05) among disease SNPs. By random chance, we 
expected about nine SNP-gene pairs to reach nominal association 
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Figure 2. State-specific effects of eQTL SNPs. A) For a subset of genes, the correlation effects ((3) of the top associated SNP across resting and 
stimulated cells differed. The genes shown with a black-dotted vertical lines had significantly different effect sizes across states. Black horizontal 
segments in B)-D) denote median values. Blue panels show resting-state (normalized) expression values; red panels show stimulated expression 
values. B) rs942793 G significantly increased the expression of ZMIZ1 only in stimulated cells. C) rs12746918 T was correlated with increased expression 
of PLCH2 in resting cells only. D) rs4840565 c decreased BLK expression in stimulated cells nearly twice as much as in resting cells [/i rest (SE) = - 
0.366(0.085), ft tim = -0.805(0.071)]. 
doi:10.1371/journal.pgen.1004404.g002 



in each stimulation state. However, we observed 26 pairs (14.2%) 
with nominal association in resting cells (p = 4.67 x 10~ , one-tailed 
binomial test). Even more strikingly, we observed 45 pairs (24.7%) 
with nominal association in stimulated cells (p<\0 , one-tailed 
binomial test). 

To identify those instances where the disease-associated SNP 
could explain the entire eQTL signal in the gene region, we 
applied conditional analysis to identify any residual signals after 
controlling for the disease SNP. In five of the ten genes (BLK, 
C5orf30, GSDMB, IRF5, PLEK), conditioning on the disease SNP 
obviated any remaining eQTL signal in the region (no SNP with 
permutation /i-value <0.05; Figure 3), suggesting that there was a 
single variant (the disease-associated SNP or one in very high LD 
to it) that drove variation in expression. Interestingly, as previously 
noted, the lead SNPs in resting and stimulated states for BLK and 
GSDMB were in partial linkage to each other. The absence of 
residual eQTL signal upon conditioning on the same risk allele 
might suggest that the lead SNPs were indeed tagging the same 
causal SNP in each of these genes. In each of the other five genes 
(ORA1DL3, SKAP2, TMPRSS3, T.NFRSF14, and <MZ£i), evidence 
of independent eQTL effect remained after conditional analysis. 



In these instances the disease-associated SNP and remaining lead 
signal are in partial linkage disequilibrium (r = 0.36-0.73). In 
these cases, we could not conclude whether the disease SNPs drove 
the alteration in expression, or whether the true causal SNPs were 
in partial linkage and caused spurious associations. It is probable 
that disease risk alleles were indeed causal, yet we could not 
confidently fine-map the effect due to experimental noise in 
expression assays or inadequate sampling. 

We note that another 26 genes within disease loci associated 
contained ciy-eQTL signals, but that these ciy-eQTL signals did not 
co-localize with RA, T1D, or CeD alleles. As these loci had been 
fine-mapped using Immunochip, the lack of overlap strongly 
suggested that these a'r-eQTLs and disease-causing variants were 
distinct. For example, rs798000 is an RA risk allele located in a 
non-coding region upstream of CD2, CD58, and PTGFRN. 
However, it was not associated with the expression of any of 
these genes (p>0.5). Another example was rs6911690, an RA 
allele located about 60 kb 5' of PRDM1, that was not associated 
with the expression of the gene at rest or after stimulation (/?>0.5). 
The lead eQTL SNP associated to PRDM1 was rs578653 (FDR< 
10 ), which was not in LD with the disease allele (r 2 <0.05). 
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The genetic basis of CD4 + T EM cell proliferation 

The relative peripheral abundance of CD4 + T E m cells varied 
between individuals (mean = 9.57%; SD = 4.85%), and was 
reproducible 35 individuals with two separate blood draws more 
than one month apart (Pearson's r— 0.87, p= 1.77xl0 -11 , see also 
Figure S2B). Consistent with other studies, we observed that the 
relative proportion of CD4 + T EM cells increased with age by 
0. 1 1 % per year (p me = 1 .92 x 10~ 3 ) [22] . We also observed that on 
average men had 2.22% more CD4 T E m cells than women 
(/'gender =3. 80 xlO" 2 ; see Figure S4). Upon anti-CD3/CD28 
stimulation, there was a substantial inter-individual variation in 
proliferation measured by both division index (DI, average 
number of divisions undergone by all cells; mean = 1 .46, 
SD = 0.35), and proliferation index (PI, average number of 
divisions undergone only by dividing cells; mean = 2. 16, 
SD = 0.21). Proliferation metrics were also reproducible in the 
35 individuals (Pearson's r DI =Q.bl; Pearson's r P£ = 0.62, Figures 
S2C and S2D). Interestingly, proliferation was negatively 
correlated to the proportion of CD4 + T EM cells 
(p D /= 1.28xl0~ 3 , p PJ = 1.93xl0~ 3 ), but was not associated to 
age or gender (/>>0.3). This negative correlation needs to be 
replicated in an independent dataset. Effector functions of T EM 
cells with higher proliferative capacities need to be examined to 
understand whether they represent a hyperactive subset whose 
abundance is controlled to maintain immune homeostasis. Possibly 
individuals with a lower proportion of T EM cells are relatively 
enriched for these subsets. 

We tested genome-wide SNPs for association to relative 
abundance, division index, and proliferation index, considering 
/;<5xl0 as the threshold for significance. For abundance, we 
included gender, age, and the top five principal components of 
genotypes as covariates. Given the correlation with proliferation, 
we also included the measured CD4 + T EM relative abundance as 
an additional covariate. We observed associations to division index 
in several loci, including 13q34 led by rs389862 (p = 4.75 x 10" 8 ; 
Figure 4A). This SNP is a non-coding variant located 30 kb 
upstream of RASA3, and 70 kb upstream from CDC16. Both genes 
have known roles in regulating cell proliferation or differentiation 
[23,24]. This SNP was also strongly associated with proliferation 
index (p= 2.75x10 ). Additionally, there was a strongly sugges- 
tive association to rs3775500 on chromosome 4, located in the 
intron of DAPP1, which encodes the Bam32 protein 
(/> = 5.40x10" ; Figure 4B), which is an adaptor protein 
expressed solely in antigen presenting B cells. Interestingly, 
mutations in this gene have been shown by several groups to 
affect T cell activation [25,26], suggesting the possibility that B 
cells may indirectly regulate T cell function in autoimmunity. We 
did not observe any significant association with the relative 
abundance of CD4 T EM cells. 

When we extracted the association statistics of 118 densely 
genotyped risk alleles of CeD, RA, and/or T1D, they showed no 
inflation in association /(-values for relative abundance of CD4 + 
T EM cells (Figure 5 A, Table S2). This suggested that risk 
variants did not modify risk via modulation of CD4 + T EM 
peripheral abundance. We recognized that the power to detect 
significant associations might have been limited in our study by the 
sample size. However, this negative finding was corroborated by 
results from a recently published study with data from ~2800 
individuals, in which the same set of risk alleles also showed no 
significant association to CD4 + T EM (see Figure S5) [27]. 
Similarly, the same set of risk alleles did not show significant 
association to proliferative response (Figure 5B, Table S2). 
Based on these data, it was unlikely that SNP variants associated to 



June 2014 | Volume 10 | Issue 6 | e1004404 



Autoimmune Alleles and eQTLs in CD4 Effector Memory T Cells 



<J in 



E 



a 

0) 
T3 



o » 



0) 

c 

0) 
IN 

-Q 

IS 



rM <— i— 



vo m ai oo 



«- r- O 



vvvvvvvvvvvvvv 



Cv vO vO 



Ov CO vO ro 

fN r\| •- rp 



t- CO i- 



vO vO CO 



Lfl 




vO 


? 

un 

CO 


cm 


o 
o 


vO 

LT1 

o 


rN 


fM 


O 
ro 


un 
VO 


CO 


un 
Oi 
(N 


ro 
ro 


PO 

o\ 


vo 


ro 
un 




fN 




T£ 


vo 





v£> vO O t- 



ro vO r- 



0\ vO CO 



i — i/i rn r\l 



o 


o 


o 


o 


o 


o 


o 


o 


d 


d 


d 


d 


V 


V 


V 


V 


o 


en 


ro 


CO 




o 




o 


Ut 


LLI 

VO 




LLI 

ro 




q 


ro 




«o 


ro 


fN 


vd 


ro 


O 


fN 


ro 


vO 


CO 


O) 


vO 


d 






d 




U3 




h- 



V V V V V V V 



vO CO Ch ^ ro rN 



CO vO fN Ov 



vO vO i — lo CO 
-- m rq vO CO O 



O i- O i- 



t— t— m co 

rM T IN M 

rN ro Ci Ch 

CM vO i- 



LT) T~ T~ 



H O *° 

* o I g 

_i i- Q Q 

ca u u Q 



m <° _ 

2 % £ 

CQ Q CO 

OC W i- 

O U d 



Q 

S VJ 

a. 2 

O a. 



in 

? « " 
E a. °f 

oc * S 

CL lfl H 



a h 



a. u 



-D 3= 



f s 

S3 



£ -2> g <V 5 

- o ■£ o 

c -n 5 ^ 

-g aj aj & ro 

£ += t; ^ 

.E E J c - 

i/> t; "° , o 



PL0S Genetics | www.plosgenetics.org 



7 



June 2014 | Volume 10 | Issue 6 | e1004404 



Autoimmune Alleles and eQTLs in CD4 Effector Memory T Cells 



A 8p23-p22 locus 



Unconditional analysis 



Conditional on risk allele 



20 
10 

0 



\ • rs4840565 



jt+-+*.*>ir*<*» > , L.i. — ,: , .. j <l^ %r+7r* KU 




11.30 



11.35 



11.30 



-► FAM167A 



-f FAM167A 



C8orf12 

B 5p21 locus 




o 
o 

(/> 
If) 

(0 



a> 
o 
C 
m 
o 

'c 

O) 
CO 



PPIP5K2 
C 17q12-q21 locus 

15 



10 
5 

0 



rs1 2936409 



6409""^*'*> •• % 



• "V ^' 



38.00 



IKZF3 ZPBP2 

D 7q32 locus 

15 
10 
5 

o 



38.05 gsdmB 38 10 38 15 3800 

< M • ♦ •— ► » > •— ► 

ORMDL3 I RRC3C GSDMA PSMD3 IKZF3 ZPBP2 




ORMDL3 LRRC3C GSDMA PSMD3 



• / , Vs3807306 



128.50 KCP 
ATP6V1F < 




LOC 100 130705 

E 2p14-p13 locus 



LOC100130705 



.. ' . M rs 10167650 

•| * *•** • 






..* • • 







CNRIP1 

Position (MB) 



Figure 3. Five disease risk alleles explained the eQTL associations with five genes. The left-sided panels show unconditional SNP- 
expression association results. Green dashed lines mark the TSS of the eQTL gene. The red dots indicate the risk alleles associated with the expression 
of respective genes shown as red arrows. The right-sided panels show adjusted association results after conditioning for the respective risk alleles. In 
each of the five loci, conditioning on the disease SNP obviated signals in the entire region, such that no association more significant than p = 0.05 
remains. 

doi:1 0.1 371 /journal.pgen.1 004404.g003 



RA, T1D, or CeD conferred risk through modulation of CD4 + 
Tem cell abundance or proliferation. 

Gene expression in resting cells predicted post-TCR 
stimulation proliferation 

After stimulation we observed that 122 genes showed significant 
changes in expression in response to stimulation, including 78 
whose expression at least doubled or decreased by 50% (Table 
SI). The gene with the greatest post-stimulation induction was 
G^MB (average fold change = 93.48), which encodes granzyme B, 



a protein involved in the apoptosis of target cells during cell- 
mediated immune response in cytotoxic and memory lympho- 
cytes. The most significantly down-regulated gene was GRB10 
(average fold change = 0.18), which is near rs6944602 associated 
with T1D and encodes growth factor receptor-bound protein 10, 
whose function in the immune system is unclear. 

We observed that relative gene expression at rest predicted 
proliferative response. In 182 individuals with both proliferation 
and gene expression data, 1 7 of the 215 genes were associated 
with proliferation index (/><0.01, two-tailed test by permuting 
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Figure 4. Genome-wide association to division index (the 
average number of division undergone by all cells). A) 

rs389862 A on chromosome 13 was significantly associated to increased 
division index at p = 4.75x10~ 8 , and is located in a non-coding region 
30 kb upstream of RASA3, and 70 kb upstream from CDC16. B) 
rs3775500 G on ch romosome 4 shows a strongly suggestive association 
at p = 5.40x1 0~ 7 , and is located within the DAPP1 (Bam32) gene. 
doi:10.1371/journal.pgen.1004404.g004 
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proliferation data. Figure 6, Table SI). Increased expression of 
15 of the 17 genes including CCR5, IL2RB, PRR5L, and TBX21, 
were correlated with reduced proliferative response, while CCR9 
and the IncRNA XLOC_003479 showed significant correlation 
with increased proliferation. This number of correlated genes was 
far in excess of random chance based on a null distribution 
consisting of 1000 permutations (/><10 - ' , median 2, maximum 
15). The weighted sum of the 17 genes served as a "proliferation 
potential signature", where we weighted the positively- and 
negatively-correlated genes as +1 and —1, respectively. This 
signature strongly predicted proliferation index (r = 0.55). We 
show the correlation between each of the 1 7 genes as well as the 
aggregate signature to proliferation as a heatmap (Figure 6A). To 
assess if we were overfitting the data, we applied a two way cross- 
validation, where we defined the proliferation signature based on 
genes from half of the individuals and tested correlation to 
proliferation in the remaining half of the individuals. In both 
instances we again observed significant prediction of proliferation 
(r = 0.41,one tailed p<10~ :5 by permutation; r = 0.39, p<10~ 3 ). 

To search for biological pathways underlying genes correlated 
to proliferation, we applied gene set enrichment analysis (GSEA) 
to test for enrichment for 1 ,008 functional gene sets based on Gene 




0 0.5 1.0 1.5 2.0 2.5 0 0.5 1.0 1.5 2.0 2.5 



expected -log 10 (p) expected -log 10 (p) 

Figure 5. Risk alleles of CeD, RA, and T1D, showed no 
significant association to CD4 T EM cell abundance or prolifer- 
ation. A) The 118 SNPs with association to diseases in densely 
genotyped regions on Immunochip platform were not significantly 
associated to CD4 T EM cell abundance. The shaded region shows 95% 
confidence interval. See also Figure S5. B) The same set of 118 risk 
alleles also showed no inflation in association with proliferative 
response measured as division index. 
doi:10.1371/journal.pgen.1004404.g005 



— GO:0012502 (induction of programmed cell death) 

— GO:0002285 (lymphocyte activation involved in 

immune response) 




Figure 6. Relationship between baseline expression and post- 
stimulatory response. A) Baseline expression of 17 genes correlated 
with post-stimulation proliferation. Rows in the heatmap are ordered 
from top to bottom by ascending proliferation index. Genes/columns 
are ordered from the most negatively correlated (IL23RB) to the most 
positively correlated (CCR9). The 17-gene signature was calculated as 
the weighted sum of the 17 genes, where the negatively-correlated 
genes were given a weight of -1, and the positively-correlated genes 
were given a weight of +1. Table S1 lists the correlation coefficient and 
p-value for each gene. B) Genes correlated with proliferative response 
were enriched for apoptosis and lymphocyte activation pathways. 
Genes correlated to lower proliferative response (proliferative index) 
were enriched for Gene Ontology code GO:0012502 (induction of 
programmed cell death, p= 1.8x10~ 4 ). Conversely, genes correlated to 
higher proliferative response were enriched for GO:0002285 (lympho- 
cyte activation, p = 3.9x10" ). 
doi:10.1371/journal.pgen.1004404.g006 

Ontology codes [28] (Figure 6B). Genes correlated to reduced 
proliferation were most significantly enriched for GO:00 12502 
(induction of programmed cell death; one tailed p— 1.8x10 4 ); 
those correlated with increased proliferation were most signifi- 
candy enriched for GO:0002285 (lymphocyte activation involved 
in immune response, one tailed p= 3.9x10 j. 
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Using data from 29 individuals each with two samples collected 
at least one month apart, we replicated the observed correlation. 
In these samples we performed a cross-visit analysis, and observed 
that the same 17-gene signature from the first visit significantly 
predicted proliferation indices on the second visit (r=0.65, 
p — 0.0006, 1-tailed permutation), and vice versa (r= 0.55, p = 0.0019). 

Discussion 

To fine-map and link risk loci to their pathogenic mechanisms, 
we investigated molecular and immune phenotypes potentially 
leading to disease end-points. The immune system is particularly 
complex, and different cells under various activation states have 
specialized functions that may not be adequately captured by 
examining PBMCs. Therefore, we focused on one purified cell 
population that had been shown to be important for the patho- 
genesis of several autoimmune diseases. We quantified population 
variation in several traits, including peripheral abundance, 
proliferative response to TCR stimulation, and expression of 
genes within autoimmune disease loci at rest and after stimulation. 
In Tables 1, 2, and S2, we provide significant cis-eQTLs and 
genome-wide association results. 

To our knowledge this study was a first cross-examination of 
genetic-, transcriptional-, and cellular-level quantitative traits in 
CD4 + T EM cells. It demonstrated the importance of focusing 
functional studies in a purified cell population under relevant 
developmental and stimulation states. By examining the prolifer- 
ative response upon TCR stimulation, we identified a subset of 
genes whose baseline expression predicted proliferative potential. 
Intriguingly, these genes were involved in programmed cell death 
and lymphocyte activation. Whether variation in proliferative 
abilities correlated with cytokine production and other signaling 
functions, thus affecting susceptibility to autoimmunity, remains a 
question to be addressed by future studies. 

Of the 205 genes in disease loci that we examined, 46 had cis- 
eQTLs. Notably, eleven of these were specific to stimulated CD4 + 
Tem cells, and not previously found in PBMCs. We noted that 
approximately 10% of genes within risk loci of diseases had cis- 
eQTLs. However in many instances the lead eQTL SNPs were 
unrelated to the disease-associated SNPs. One example of a 
disease allele that functioned as eiv-eQTL was rs39984, which was 
associated to lower risk of RA, and regulated the expression of 
C5orJ30 encoding an UNCI 19-binding protein. This SNP variant 
is located in the first intron of C5orJ30, and indeed explained the 
entire m-eQTL signal in this gene (see Figure 2B). This eQTL 
effect was previously undetected in PBMCs, and the protein's 
functional role in the immune system is largely unknown. 
However, a recent study showed that rs26232 (the lead GWAS 
SNP prior to fine-mapping, r 2 = 0.988 to rs39984) was correlated 
with lower severity of radiologic damage in RA, independent of 
previously established biomarkers [29] . Another gene in the locus, 
GZ/V7, is located 140 kb from rs39984; however its expression 
showed no correlation with the SNP (/>>0.5). 

Another CD4 + T EM cell-specific eQTL gene was DDX6, which 
encodes DEAD-box RNA helicase 6. However, in this case, the 
lead eQTL SNP (rs4938544) associated to increased expression of 
DDX6 in stimulated cells was not in LD with the known CeD risk 
allele (rsl0892258, r 2 <0.1) or the RA risk allele (rs4938573, /< 
0.1). Neither risk allele showed significant association to DDX6 
expression (p — 0.19 and 0.26, respectively). Both risk alleles are 
also located near CXCR5, BCL9L, and TREH; none of these genes 
had reported cu-eQTLs in PBMCs [8]. However, we did not assay 
these three genes in this study, therefore could not confirm the role 
of disease alleles in regulating their expression in CD4 + T EM cells. 



Although we did not assay all genes or test for fraaf-acting 
eQTLs, based on the level of co-localization between eQTL SNPs 
and risk alleles observed in the study, we found it unlikely that all 
non-coding risk variants caused disease by altering gene expression 
within resting or stimulated CD4 + T EM cells. In addition, while 
changes in proportions of lymphocyte subsets had been observed 
in patients of autoimmune disorders [30-35], we did not find 
evidence to support disease alleles' roles in directly modulating 
CD4 + T EM cell abundance or proliferative response. Ultimately, 
other cell states and cell types will need to be investigated. 

We recognize several limitations to the current study. In order 
to conduct a focused study on a small amount of purified primary 
cells we used the NanoString nCounter assay system. This avoided 
potential biases and artifacts arising from cDNA synthesis required 
for microarray or RNA-seq studies, but restricted our analysis to a 
subset of candidate genes within risk loci of CeD, RA, and T1D, 
rather than a genome-wide expression analysis. Consequently we 
could not identify trans-eQTLs, splice variants, or epistatic effects 
on expression regulation. Additionally, anti-CD3/CD28 stimula- 
tion for memory T cells is not antigenic, especially while in 
isolation from a "natural" multi-cellular environment, thus it was 
only partially physiological. 

This and other cell-specific studies on population variation in 
molecular phenotypes are only a beginning of examining potential 
intermediate phenotypes. Post-activation cytokine production by 
CD4 + T EM cells are likely crucial in driving autoimmunity. 
Therefore, it is critical that future studies of molecular phenotypes 
include proteomic assays to quantify functional markers of immune 
response. Finally, functional experiments will need to be conducted 
in the future to determine whether these molecular phenotypes are 
indeed intermediary to disease. 

Materials and Methods 

Ethics statement 

All research was approved by our Institutional Review Board, 
and informed consent was obtained from each volunteer. 

Study sample 

We enrolled 225 healthy volunteers (134 females, 91 males) of 
non-Hispanic Caucasian descent that proved informed consent 
through the Phenogenetics Project at Brigham and Women's 
Hospital. Subjects' ages ranged from 19 to 57 years with average 
female and male ages of 28.8 years and 34.9 years, respectively. 
Thirty-five subjects (18 females, 17 males) returned for a second 
study visit one to nine months after their initial visits. 

Genotyping 

We genotyped each subject using the Illumina Infmium Human 
OmniExpress Exome BeadChip. In total, we genotyped 951,117 
SNPs, of which 704,808 SNPs are common variants (minor allele 
frequency |MAF]>0.01) and 246,229 are part of the exome. After 
quality control, 638,347 common SNPs remained. Of all subjects, 
174 subjects had abundance, proliferation, gene expression, and 
quality controlled genotype data. Detailed quality control criteria 
are described in Text SI. 

SNP imputation 

For each gene, we selected a 500 kb region (250 kb each in the 
3' and 5' directions) around the transcription start site and 
imputed 1000 Genomes SNPs into the genome-wide SNP data 
using BEAGLE Version 3.3.2. We used the European samples 
from 1 ,000 Genomes as the reference panel. We excluded markers 
that had MAF<0.05 in the reference panel as well as all insertion/ 
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deletions. After imputation, we excluded markers with a BEAGLE 
R 2 <0.4 or MAF<0.01 in the imputed samples. 

CD4 + T EM cell isolation and stimulation 

We isolated peripheral blood mononuclear cells (PBMC) from 
whole blood using a Ficoll density gradient (GE Healthcare). We 
then isolated CD4 + effector memory T cells from PBMCs first by 
magnetic-activated cell sorting to enrich for CD4 + T cells, followed 
by fluorescent-activated cell sorting using labeled antibodies 
against CD45RA, CD45RO, and CD62L. 

We stimulated CD4 + T EM cells by incubation with commercial 
anti-CD3/CD28 beads for 72 hours. For proliferation studies, we 
labeled cells with carboxyfluorescein diacetate succinimidyl ester 
(CFSE; eBioscience), and measured proliferation by dye dilution. 
Detailed isolation and purification methods are described in Text 
SI (also see Figure S2A). 

Gene expression 

We designed the NanoString codeset based on GWAS SNPs 
associated with CeD, RA, and T1D as of April 2012. This list of 
SNPs can be found in Supplementary Table S4. As the 

numbers of associated loci with autoimmune diseases continuously 
expand, we refer the reader to ImmunoBase (https://www. 
immunobase.org) for up-to-date disease regions. For each locus, 
we defined a region of interest implicated by the GWAS lead SNP 
[36]. We identified the furthest SNPs in LD in the 3' and 5' 
directions (r 2 >0.5). We then extended outward in each direction 
to the nearest recombination hotspot. If no genes were found in 
this region, we extended an additional 250 kb in each direction. 
All genes overlapping this region were considered implicated by 
the locus. The final NanoString codeset (prior to expression data 
quality control) included 312 genes, including 270 genes near 
SNPs associated with 157 RA, CeD, and T1D through GWAS, 26 
genes of immunological interest, and 15 reference genes with 
minimal change in expression after TCR stimulation (see 
Supplementary Table SI). 

After quality control, 215 genes remained. Of all 225 subjects in 
the study, 187 subjects passed gene expression quality control for 
both resting and stimulated cells. Specific normalization and 
quality control procedures are described in Text SI. 

Genotype principal component analysis 

To control for any potential population stratification, we 
adjusted all association tests using the top five principal 
components of our genome-wide SNP data. Principal components 
were generated via EIGENSTRAT using unsupervised analysis 
(no reference populations were used). The top five PCs explained 
6.88% (2.08%, 1.27%, 1.20%, 1.17%, and 1.16%, respectively) of 
the total variance. After controlling for these five PCs, the lambda 
GC for CD4 T EM proportion association was 1.008; that of 
division index was 1.001. 

Cis-eQTL analysis 

For each gene-SNP pair, we applied linear regression using the 
first five principal components of the genotype data and gender as 
covariates. As such, normalized expression = Po+P^allelic dosage+ 
P2*PC 1 +P3»PC2+P4»PC3+P5»PC 4 +P 6 *PC 5 +P7*(factor)gender. To 
adjust for multiple hypothesis testing while taking into consider- 
ation the correlation among SNPs within each locus, we calculated 
a permutation-based /J-value for each SNP. We performed 10,000 
permutations of the residual expression values. We reported each 
SNP's /(-value the proportion of permutation P value smaller than 
the analytical Rvalue. For conditional analysis, the vector of allelic 



dosages of the disease-associated SNP was included as an 
additional covariate. 

Quantification of CD4 + T EM cells and proliferative 
response 

We defined CD4 + T EM cells as CD45RA", CD45RO + , and 
CD62L 1 ™'". In all samples CD4 + 

Tem cells were quantified using 
X-Cyt, a mixture-modeling based clustering program for automat- 
ed cell population identification (see Figure S6) [19]. We fit 
proliferation division peaks with one-dimensional Gaussian 
mixture models (see Figure S7). Detailed protocol and algorithms 
are described in Text SI. 

Statistical analysis 

All linkage disequilibrium calculations (r 2 ) were based on 1000 
Genomes Release 3 European samples. All association tests were 
performed using Plink vl.07. We considered /><5xl0 8 to be 
genome-wide significant; p<5 x 10 5 was considered as suggestive. 
CD4 + T E m abundance and proliferation correlations with age and 
gender were calculated by multivariate linear model implemented 
in R-3.0. We calculated two-sample comparisons (CD4 + T EM cell- 
specific expression between genes, and H3K4me3 h/d scores 
between SNPs) with the Mann- Whitney test. Details of statistical 
analyses are described in Text SI. 

Data access 

We make all phenotypic data (expression, peripheral abundance, 
and proliferation) along with eQTL results publicly available online 
(http://immunogenomics.hms.harvard.edu/CD4eqtl.html). Ge- 
nome-wide genotype data will become available through dbGAP 
and through the ImmVar project. These data are potentially useful 
to investigators wishing to assess the potential of genetic variants in 
altering these molecular phenotypes. 

Supporting Information 

Figure SI Enrichment of cell-specific expression of genes within 
risk loci. As described in Hu et al. AJHG 201 1, A) genes within risk 
loci of RA were the most specifically expressed in CD4 + T E m cells 
ip = 1.00 x 10~ 8 ) followed by signal in regulatory T cells 
(p = 5.00 x 10~ 8 ). B) Genes within CeD were also the most strongly 
enriched in CD4 TEM ceUs (p= 1.43 xlO" 5 ) followed by 
regulatory T ceUs (p = 3.78 x 10" 3 ). C) In T1D, CD8 memory T 
cells showed the strongest enrichment (p — 2.26x10 '), followed 
by regulatory T cells (p = 5.13 x 10" 3 ) and CD4 + T EM cells 
(p= 1.29xl0" 4 ). 
(TIFF) 

Figure S2 A) Using a combination of magnetic and fluores- 
cence-activated cell sorting (MACS and FACS), CD4 + T cells were 
isolated to a high degree of purity. The isolated population 
contained -97% CD3 + cells, -90% CD4 + cells, -0.4% CD8 + 
cells, and —0.03% CD19 + cells. B) The relative abundance (as a 
percentage of all sorted lymphocytes), C) division index (average 
division of all cells), and D) proliferation index (average division of 
all cells that went into division), were reproducible in 35 
individuals with two blood draws at least one month apart. 
Pearson's r=0.87, 0.57, and 0.62, respectively. 
(TIFF) 

Figure S3 The 46 eQTL SNPs show more overlap with 
H3K4me4 marks. A) Cis-eQTL SNPs were located nearer 
H3K4me3 peaks in CD4 T EM cells than the 159 top SNPs that 
did not reach statistical significance at 5% FDR (p= 1.10x10 7 , 
one-sided Mann- Whitney test). B) The 46 of-eQTL SNPs were 
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near larger H3K4me4 peaks (peak height) and located at smaller 
distances to the summit of the peaks [p = 9.56 x 10 _!! , one-sided 
Mann- Whitney test). 
(TIFF) 

Figure S4 The relative abundance of CD4 T EM cells as the 
percentage of CD4 T cells. A) increased with age, at 0.11 % per 
year; and B) was correlated with gender, where men on average as 
2.2% more CD4 T EM cells than women. The associations 
remained significant in a multivariate linear regression. 
(TIFF) 

Figure S5 SNPs associated to CeD, RA, and T1D, showed no 
significant association to CD4 T E m cell abundance. The 119 risk 
alleles within densely genotyped loci showed no significant 
association to CD4 T EM abundance as a percentage of CD4 T 
cells in the study by Orru et al. The shaded area shows the 95% 
confidence interval. 
(TIFF) 

Figure S6 Quantification of CD4 T EM cells using X-Cyt. A) In 
each sample of enriched CD4 T lymphocytes, X-Cyt clustered all 
flow events based on fluorescence intensities in CD45RA, CD45RO, 
and CD62L simultaneously, using a seven-component multivariate 
Gaussian mixture-modeling. The T E m cell population is shown in 
red, defined as CD45RA", CD45RA + , and CD62L low/ ". B) X-Cyt 
clustered and quantified CD4 T EM cells in all samples (four random 
samples are shown here) in the study following the template in A). 
The T EM cell population is shown as the red cluster in each sample. 
In Sample 3, the subset of the black population residing in the lower 
left quadrant is the light green population identified as "debris" in 
Panel A); they are CD62L", CD45RA", and CD45RO". 
(TIF) 

Figure S7 The CFSE intensity peak present in the pooled 
resting wells for each subject (data underlying the green fitted 
curve) was modeled as a single Gaussian distribution. Its mean and 
variance were then used to initialize the location of the first 
component (undivided cells) and the variance of all components in 
each of the stimulated wells (data underlying the red fitted curve). 
The CFSE dilution peaks from stimulated wells were fitted using a 
one-dimensional mixture model of multiple Gaussian components 
of equal peak-to-peak distance and equal variance via a gradient 
descent optimization algorithm. A maximum of six components 
(five divisions) was fitted to each stimulated well; the weight of each 
component was allowed to be 0. 
(TIF) 
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